Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity

Authors

  • Kristina Toutanova
  • Michel Galley

Abstract

Contrary to popular belief, we show that the optimal parameters for IBM Model 1 are not unique. We demonstrate that, for a large class of words, IBM Model 1 is indifferent among a continuum of ways to allocate probability mass to their translations. We study the magnitude of the variance in optimal model parameters using a linear programming approach as well as multiple random trials, and demonstrate that it results in variance in test set log-likelihood and alignment error rate.
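
The non-uniqueness claim can be illustrated on a toy case. The sketch below is not taken from the paper: it assumes a hypothetical one-sentence corpus (source "a b", target "x y"), omits the NULL word and constant length factors, and uses illustrative names (`CORPUS`, `log_likelihood`, `table`). Because "a" and "b" always co-occur, the likelihood depends only on the sum t(x|a) + t(x|b), so every table with that sum equal to 1 reaches the same maximum.

```python
import math

# Toy parallel corpus (hypothetical, not from the paper): one sentence pair,
# source "a b" paired with target "x y"; the NULL source word is omitted.
CORPUS = [(["a", "b"], ["x", "y"])]

def log_likelihood(t, corpus):
    """IBM Model 1 log-likelihood, up to constant length factors:
    sum over target words f of log((1/l) * sum over source words e of t(f|e))."""
    total = 0.0
    for src, tgt in corpus:
        for f in tgt:
            total += math.log(sum(t[(f, e)] for e in src) / len(src))
    return total

def table(p, q):
    """Translation table with t(x|a) = p and t(x|b) = q; each row sums to 1."""
    return {("x", "a"): p, ("y", "a"): 1.0 - p,
            ("x", "b"): q, ("y", "b"): 1.0 - q}

# Every table on the line p + q = 1 attains the same maximal value log(1/4).
optima = [log_likelihood(table(p, 1.0 - p), CORPUS)
          for p in (0.0, 0.25, 0.5, 0.9, 1.0)]

# A table off that line scores strictly lower.
worse = log_likelihood(table(0.9, 0.9), CORPUS)

print(optima, worse)
```

In this toy setting the optimum is a whole line segment of parameter values rather than a single point, which is the sense in which the objective is concave but not strictly concave.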

Similar resources

Mixed Linear Regression with Multiple Components

In this paper, we study the mixed linear regression (MLR) problem, where the goal is to recover multiple underlying linear models from their unlabeled linear measurements. We propose a non-convex objective function which we show is locally strongly convex in the neighborhood of the ground truth. We use a tensor method for initialization so that the initial models are in the local strong convexi...

The IBM Mixture Models 1 and 2 for Word Alignment

This is a tutorial on the IBM models 1 and 2 for word alignment. In contrast to many other presentations, I motivate the models from a mixture model rather than from a translation perspective. This view makes it easier to derive the EM algorithms for learning and to understand why the likelihood function of the models usually has multiple optima.

Strict convexity of the free energy for non-convex gradient models at moderate β

We consider a gradient interface model on the lattice with interaction potential which is a non-convex perturbation of a convex potential. We show using a one-step multiple scale analysis the strict convexity of the surface tension at high temperature. This is an extension of Funaki and Spohn’s result [10], where the strict convexity of potential was crucial in their proof. AMS 2000 Subject Cla...

A Convex Alternative to IBM Model 2

The IBM translation models have been hugely influential in statistical machine translation; they are the basis of the alignment models used in modern translation systems. Excluding IBM Model 1, the IBM translation models, and practically all variants proposed in the literature, have relied on the optimization of likelihood functions or similar functions that are non-convex, and hence have multi...

Publication date: 2011